Introduction

NSF publishes its past awards data in a structured format that you can download and analyze. However, other funding agencies or foundations may not provide such data. Here, I will show you how to extract past awards data from the internet using the William T. Grant Foundation’s past awards as an example.

Unfortunately, the code here won’t work as is on any other website, since every website is structured differently. However, you can use the code and explanations here to get a sense of what you need to do to extract data from other websites.

You can always get more customized support from GitHub Copilot, which is free for those with university affiliations, or from other generative AI tools. However, to get the best results, you need to ask for what you need and share the errors you come across step by step.

Extracting Past Awarded Grants Data from the Internet

Let’s start by loading the necessary libraries. If you don’t have these libraries installed, you can install them using the install.packages() function. You can uncomment the install.packages() lines below to install the libraries.
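A minimal sketch of this setup is shown below. The package list is an assumption based on the functions mentioned later in this post: rvest for scraping, furrr for parallel processing, readr for parsing numbers and writing CSV files, and dplyr for data manipulation. Adjust it to whatever your own code needs.

```r
# Uncomment the lines below to install any package you are missing.
# install.packages("rvest")
# install.packages("dplyr")
# install.packages("readr")
# install.packages("furrr")

library(rvest)  # reading web pages and selecting elements by XPath
library(dplyr)  # data manipulation (mutate, distinct, ...)
library(readr)  # parse_number() and write_csv()
library(furrr)  # parallel versions of the purrr map functions
```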

Our base URL in this example is the William T. Grant Foundation’s Awarded Research Grants Library.

As of 2025-05-01, there are 23 pages of past awards data, totaling 441 awards. We will extract all of them.

There are multiple challenges with extracting the data from William T. Grant Foundation’s website. Most importantly, you need to click on each award to get the details. This means that you need to extract the URLs of each award first and then extract the details from each URL.

We will initialize an empty vector to store all URLs.

We will loop through all pages and extract individual URLs. Since we cannot extract 441 URLs manually, we need to find a pattern in the URLs, and put it in a loop.

To get the URLs, you need to inspect the page and find the XPath of the elements that contain the URLs. You can do this by right-clicking on the element that you are interested in and selecting “Inspect” in Chrome. Then, right-click on the element in the “Elements” tab and select “Copy” > “Copy XPath”.

Doing this shows that the XPath of the links, that is, the hyperlinked title of each awarded grant, is //*[@id="library-content"]/div/article/div[2]/ul/li/div/div/a.

I put this information into my loop to extract all URLs.
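The loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact code behind this post: `base_url` is a placeholder that you need to replace with the address of the library page, the `?page=` pagination pattern in the hypothetical helper `build_page_url()` is a guess that you should check against the real site, and the helper names `extract_award_links()` and `collect_award_urls()` are made up for this example. Only the XPath and the page count come from the text.

```r
library(rvest)

# Placeholder: replace with the address of the Awarded Research Grants Library.
base_url <- "https://example.org/awarded-grants/"
n_pages  <- 23  # number of listing pages as of 2025-05-01

# XPath of the hyperlinked grant titles on a listing page (from the text above).
link_xpath <- '//*[@id="library-content"]/div/article/div[2]/ul/li/div/div/a'

# Hypothetical helper: builds the address of listing page number `page`.
# The "?page=" pattern is an assumption; check how the site actually paginates.
build_page_url <- function(base_url, page) {
  paste0(base_url, "?page=", page)
}

# Return the href attribute of every grant link found in one parsed listing page.
extract_award_links <- function(page_html, xpath = link_xpath) {
  page_html |>
    html_elements(xpath = xpath) |>
    html_attr("href")
}

# Visit every listing page in turn and collect the award URLs.
collect_award_urls <- function(base_url, n_pages, pause = 1) {
  all_urls <- character(0)  # empty vector to store all URLs
  for (page in seq_len(n_pages)) {
    page_html <- read_html(build_page_url(base_url, page))
    links <- extract_award_links(page_html)
    # Resolve relative links against the base URL; absolute links are kept as they are.
    all_urls <- c(all_urls, xml2::url_absolute(links, base_url))
    Sys.sleep(pause)  # short pause between requests to be polite to the server
  }
  unique(all_urls)
}

# Uncomment once base_url points at the real listing page:
# all_urls <- collect_award_urls(base_url, n_pages)
```

Wrapping the steps in small functions makes it easy to try the link extraction on a single page before running the full loop over all 23 pages.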

Now, we have all URLs in the all_urls vector. We can use this vector to extract the details of each award.

Here, too, we need to find patterns, this time in the detail page of each awarded grant. As before, inspect the page in Chrome and copy the XPath of each element that contains the details.

Here, we are interested in the title, amount, area, short description, and description of each awarded grant. The XPaths of these elements are as follows:

Title: /html/body/div[4]/div/div[1]/div/div/div/div/h1
Amount: /html/body/div[4]/div/div[2]/div/div[2]/div/div[5]/div

Important

When I first ran the code with these XPaths, some amounts came back as NA. It turned out that the amount is not always in the same place: sometimes it is in /html/body/div[4]/div/div[2]/div/div[2]/div/div[6]/div and sometimes in /html/body/div[4]/div/div[2]/div/div[2]/div/div[7]/div.

So, the code below is updated to check whether the amount is NA or empty and, if so, to try the other XPaths.

Area: /html/body/div[4]/div/div[2]/div/div[2]/div/div[4]/a
Short Description: //*[@id='swup']/div[2]/div/div[1]/div/div[1]/p
Description: /html/body/div[4]/div/div[2]/div/div[1]/div/div[2]

We will create a function to extract the details of each award from a single URL.
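One way such a function might look is sketched below. The XPaths and the column names (title, amount, area, desc_short, desc) are taken from this post; the helper functions `text_at()` and `first_non_empty()` are hypothetical names introduced here to implement the fallback for the amount, and the whole block is a sketch rather than the exact original code.

```r
library(rvest)
library(tibble)

# Hypothetical helper: text of the first node matching `xpath`,
# or NA if the node is missing or its text is empty.
text_at <- function(page, xpath) {
  text <- html_text2(html_element(page, xpath = xpath))
  if (is.na(text) || text == "") NA_character_ else text
}

# Hypothetical helper: try several XPaths in order and return the first
# non-empty text, or NA if none of them matches.
first_non_empty <- function(page, xpaths) {
  for (xpath in xpaths) {
    value <- text_at(page, xpath)
    if (!is.na(value)) return(value)
  }
  NA_character_
}

# The amount is not always in the same place, so we list all three candidates.
amount_xpaths <- c(
  "/html/body/div[4]/div/div[2]/div/div[2]/div/div[5]/div",
  "/html/body/div[4]/div/div[2]/div/div[2]/div/div[6]/div",
  "/html/body/div[4]/div/div[2]/div/div[2]/div/div[7]/div"
)

# Extract the details of one award from its URL as a one-row tibble.
extract_grant_info <- function(url) {
  page <- read_html(url)
  tibble(
    title      = text_at(page, "/html/body/div[4]/div/div[1]/div/div/div/div/h1"),
    amount     = first_non_empty(page, amount_xpaths),
    area       = text_at(page, "/html/body/div[4]/div/div[2]/div/div[2]/div/div[4]/a"),
    desc_short = text_at(page, "//*[@id='swup']/div[2]/div/div[1]/div/div[1]/p"),
    desc       = text_at(page, "/html/body/div[4]/div/div[2]/div/div[1]/div/div[2]")
  )
}
```

Returning NA for a missing element, instead of stopping with an error, keeps one unusual page from interrupting the whole run; the NAs can be inspected afterwards.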

Extracting detailed information about 441 awards from 441 different URLs will take a long time. We can use parallel processing to speed up the process. We will use the furrr package for parallel processing.

We will use the future_map_dfr function from the furrr package to extract the grant information from all URLs in parallel. We will run our custom-defined function extract_grant_info on all URLs in parallel.
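The parallel step could be sketched as follows. The wrapper `scrape_all()` and the choice of two workers are assumptions made for illustration; `future_map_dfr()` comes from furrr, `plan()` and `multisession` come from the future package, and row-binding the results requires dplyr to be installed.

```r
library(future)
library(furrr)

# Hypothetical wrapper: run `extract_fn` on every URL in parallel and
# row-bind the one-row results into a single data frame.
scrape_all <- function(urls, extract_fn, workers = 2) {
  # Start background R sessions and restore the previous plan when done.
  old_plan <- plan(multisession, workers = workers)
  on.exit(plan(old_plan), add = TRUE)
  future_map_dfr(urls, extract_fn)
}

# With the URL vector and the extraction function from the earlier steps:
# grants <- scrape_all(all_urls, extract_grant_info, workers = 4)
```

More workers finish sooner but also send more simultaneous requests to the website, so a modest number is the considerate choice.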

Now, we have extracted the details of all 441 awarded grants. We will now make sure that the amount column is numeric by parsing the number from the amount column.
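As a small illustration of this step, the sketch below applies `parse_number()` from readr to a toy table. The raw strings are an assumption about how the amounts appear on the website (with a dollar sign and thousands separators); only the numeric values are taken from the data in this post.

```r
library(dplyr)
library(readr)

# Toy data: the raw format of the amount strings is assumed.
example_grants <- tibble::tibble(amount = c("$599,413", "$49,678"))

# parse_number() drops the currency symbol and the grouping commas.
example_grants <- example_grants |>
  mutate(amount = parse_number(amount))

# The same call on the scraped data would be:
# grants <- grants |> mutate(amount = parse_number(amount))
```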

Let’s take a look at the first few rows of the extracted data.

title	amount	area	desc_short	desc
An Educational Game to Support Multilingual Learners’ Reading Comprehension and Science Learning	599413	Reducing Inequality	How can a game-based learning environment improve 5th grade Spanish-speaking students’ science learning?	Multilingual students, especially Spanish-speaking students, are underrepresented in STEM pathways. One contributing factor is a gap in science literacy, or the skills necessary for science comprehension and the motivation to engage with scientific content, between multilingual learners and peers whose first language is English. Technological tools can be a valuable way to support scientific learning, especially given increasing demands on teachers and decreased time dedicated to science in elementary school. This study will adapt and examine Missions with Monty, an online game-based learning environment that has shown promise in improving reading comprehension and science learning for Spanish-speaking students. Nietfeld and colleagues will work with students in eight schools across five counties in North Carolina to iteratively develop and examine language scaffolds that incorporate culturally-relevant characters and examples, as well as translation options, into the game. The team will conduct a quasi-experimental study to compare three versions of the game with different types of scaffolds on scientific learning and reading comprehension. They will also conduct post-study interviews with participating students. Findings will shed light on the promise of a technological tool for improving scientific learning outcomes for multilingual students whose primary language is Spanish.
Facilitating Collaboration Between Social Scientists and Legal Advocates to Improve Uses of Research in Justice-Oriented Court Cases	49678	Use of Research Evidence	How can collaboration between social science researchers and legal advocates be facilitated to produce research evidence that can be used if/when the Supreme Court’s ruling in the Plyler v. Doe (1982) is challenged? How does participation in research co-design improve legal advocates’ use of research and social scientists’ production of research?	In just its last two terms, the Supreme Court has overturned several rulings supporting access and opportunity for marginalized groups, with further challenges looming. One such challenge is to Plyler v. Doe (1982) in which the Court found that a Texas law seeking to exclude undocumented children from public schools was unconstitutional. Extra-legal sources, such as research-based journal articles, are increasingly incorporated into Supreme Court Justice’s opinions and in legal briefs prepared by amici-curiae, or friends of the court. As such, this study will use an embedded case study design to examine the collaborative process that is undertaken as social scientists and legal advocates produce new research evidence for use in a potential challenge to Plyer. Artifacts, observations, and interviews will be collected as participants co-design a research agenda and engage in research together in small groups. Findings will offer tangible resources to support the production of research in justice-oriented court cases, and they will provide important insight into the conditions needed to improve the use of research.
Measuring the Role of Racial Literacy in Promoting Equitable Reading Instruction through the 3Rs	648527	Reducing Inequality	How can racial literacy be developed and measured, and how is it associated with culturally informed teaching practices and student reading outcomes?	Disparities in early reading have profound consequences for children’s life trajectories, especially for Black students. Teachers’ racial literacy, the skills and practices by which teachers understand and navigate the impacts of race and racism in the classroom, is a potential critical lever for reducing inequality in education. Despite a strong theoretical base, there is a dearth of research around the measurement of racial literacy. The proposed project builds on an innovative literacy initiative, the 3Rs (Reading, Racial Equity, Relationships), a system-based program that aims to improve early literacy outcomes, particularly for Black students, that uses high-quality racially affirming picture books within communities of practice to develop teachers’ racial literacy and promote equitable reading practices. The team will use a three-year, mixed-method multi-phase study to test existing theories of racial literacy and examine its associations with teaching practices and student literacy outcomes. Findings will provide a well-theorized and quantitatively tested measure of racial literacy, whose links to practice and student outcomes can be tested systematically in future research.
Fostering Belonging in School for Black and Latinx Students: The Pathway from Teachers’ Professional Learning on Asset-based Pedagogy to Students’ Social-Emotional Experiences	399996	Reducing Inequality	How does teachers’ professional learning about asset-based pedagogy shape Black and Latinx students’ belonging and academic development in different district contexts?	Research shows that Black and Latinx students disproportionately experience lower feelings of belonging in school compared to White peers, with negative consequences for their academic trajectories. School districts are increasingly turning to asset-based pedagogy (ABP), which leverages students’ cultural identities and experiences as educational strengths, to more equitably support students’ sense of belonging. This study will examine how incorporating a validated student survey into APB shapes teacher practices, knowledge, and beliefs in ways that improve student belonging and learning. The team will collect and analyze quantitative data on teachers’ utilization and experiences of ABP professional learning, resources, and supports, as well as shifts in teachers’ practices. They will also use regression analysis to examine the descriptive relationships between ABP professional learning, teachers’ identity-affirming practices and beliefs, and student belonging and grades. By conducting this work in partnership with Chicago Public Schools (CPS) and the Tucson Unified School District (TUSD), the team will advance theory on the mechanisms through which asset-based pedagogy improves belonging and learning for Black and Latinx students.
Transformative Justice in Schools: Unraveling the Impacts of Restorative Practices on Youth Outcomes and Inequality	541730	Reducing Inequality	Do restorative justice practices reduce racial inequalities in graduation, criminal justice system involvement, and post-secondary outcomes, and if so, how?	Black and Latinx students are disproportionately more likely to be suspended or expelled from school than White students. Further, studies have shown that attending schools with stricter disciplinary policies can lead to adverse long-term outcomes, including lower educational attainment and higher chances of incarceration. Restorative justice practices, which focus on repairing harm and building community, have shown promise in creating a safe school environment that promotes belonging and learning. Building on prior research that found restorative justice practices in Chicago Public Schools (CPS) reduced both exclusionary discipline overall and racial disparities in short-term outcomes, this study will examine the longer-term impacts of these practices, as well as whether positive effects spill over to students’ younger siblings. The team will use student-level two-way fixed effects models to examine the impact of restorative practices exposure on high school graduation, postsecondary enrollment and completion, and criminal legal involvement, and examine how this varies by race, gender, and restorative practice approach. The team also plans to collect supplemental qualitative data from focus groups and interviews with students, restorative practice coaches, and school staff to explore perspectives on the quantitative findings . Through the use of innovative methods, the team will comment on the reach and the longevity of an intervention found to reduce racial inequalities in a prior study.
States as Laboratories: Experimental Trial of the Research-to-Policy Collaboration to Improve State’s Use of Evidence to Benefit Youth	200000	Use of Research Evidence	Does the research-to-policy collaboration model improve policymakers’ use of prevention science research at the state level?	Substance use remains one of the most preventable sources of morbidity and mortality, yet the majority of youth continue to not have access to evidence-based prevention services. While policies can reduce population-level substance misuse, only about a quarter of all prevention-oriented bills explicitly reference evidence-based strategies over the last decade. Findings from recent research have demonstrated the effectiveness of the Research-to-Policy Collaboration (RPC) model for improving research use at the federal level, but evidence of impact on state initiatives remains limited. Crowley and colleagues will leverage a recently funded study by National Institute on Drug Abuse and experimentally evaluate whether the RPC model can improve the use of prevention research in state policymaking. This mixed-methods study will map the state-level policy landscape through qualitative interviews, bill coding, and legislative surveys to assess and quantify value, awareness, and use of substance use prevention research. The findings from this study will reveal whether a state-level knowledge mobilization infrastructure impacts research use in state policymaking, while also addressing substance use among youth and increasing access to evidence-based prevention programs.

Voilà! There we have it. We have successfully extracted the past awarded grants data from the William T. Grant Foundation’s website.

Checking the Data for Duplicates

Let’s check the dimensions of the data.

[1] 441   5

We have 441 rows and 5 columns. 441 is the number of awarded grants. 5 is the number of columns we extracted.

We can check the data for duplicates by using the duplicated() function. This function will return a logical vector indicating which rows are duplicates.

[1] 0

We do not have any duplicates in the data. However, if we had duplicates, we could remove them by using the unique() function.

Let’s check the dimensions of the data again.

[1] 441   5

We still have 441 rows and 5 columns.
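The checks in this section can be tried on a toy data frame, as sketched below. The toy data are made up so that one duplicate actually occurs; on the scraped data you would run the same calls on your own data frame.

```r
# Toy data with one duplicated row (made up for illustration).
toy <- data.frame(
  title  = c("Grant A", "Grant B", "Grant A"),
  amount = c(100, 200, 100)
)

dim(toy)                        # rows and columns before cleaning

n_dupes <- sum(duplicated(toy)) # number of rows that repeat an earlier row

toy_unique <- unique(toy)       # keep only the first copy of each row

dim(toy_unique)                 # rows and columns after cleaning
```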

Saving the Data

We can save the extracted data to a CSV file for further analysis.
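A minimal sketch of the saving step, using `write_csv()` from readr, is shown below. The file name is hypothetical and the toy table stands in for the scraped data.

```r
library(readr)

# Toy table standing in for the scraped data.
example_grants <- tibble::tibble(
  title  = c("Grant A", "Grant B"),
  amount = c(100, 200)
)

# Hypothetical file name; the file is written to the working directory.
out_path <- "wtgrant_past_awards.csv"
write_csv(example_grants, out_path)

# The same call on the scraped data would be:
# write_csv(grants, out_path)
```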

You can refer to this guide for further analysis, visualization, or reporting.

If you have any questions or need help, feel free to reach out to me (after you consult with your AI assistant(s) first!).